(54) Information navigation system using clusterised information resource topology

(57) An information navigation system is based on an information resource topology among information
resources a-k in which each information resource is associated with at least one term combination and a set of
links, where each term combination specifies a set of terms describing the information resource and each link
links information resources with matching term combinations, and for every existing term combination, a set
of information resources that contain that combination form a cluster A-C, where a cluster is defined as a set of
information resources between which there exists at least one path which contains only information resources
from the set of information resources and is defined as a series of information resources connected through
links. The information navigation functions, including gathering, searching, and topology managing, can be
realized on this information resource topology. The system may be used for Internet searching and browsing
purposes. Similar resources can be accessed using the links.
The present invention relates to an information
navigation system using a clusterized information resource
topology, which will be called the Ingrid topology in the
following description, and which is suitable for Internet
Information Resource Discovery and Retrieval.



The references listed below will be cited in the
following description by the numeral in square brackets
that precedes each of them.

[1] T. Berners-Lee, R. Fielding, and H. Frystyk:
"Hypertext Transfer Protocol - HTTP/1.0", Internet Draft
draft-ietf-http-v10-spec-04.html, IETF HTTP Working Group,
October 1995.

[2] T. Berners-Lee, L. Masinter, and M. McCahill: "Uniform
Resource Locators (URL)", Request For Comments rfc1738.txt,
anonymous ftp from ds.internic.net/rfc, December 1994.

[3] "Frequently Asked Questions About Lycos", URL
http://lycos.cs.cmu.edu/reference/faq.html, Lycos Inc.,
1995.

[4] "Netscape Navigator", URL http://www.mcom.com/,
Netscape, 1995.

[5] C. Weider, P. Faltstrom, R. Schoultz: "How to interact
with a Whois++ mesh", IETF Internet Draft
draft-ietf-wnils-whois-mesh-01.txt, anonymous ftp from
ds.internic.net/internet-drafts, March 1995.

[6] G. Salton, J. Allan, and C. Buckley: "Automatic
Structuring and Retrieval of Large Text Files",
Communications of the ACM, 37(2), February 1994.

[7] R. Thompson, W. Croft, and R. Wolf: "A Network
Organization Used for Document Retrieval", Proceedings of
the 6th ACM SIGIR Conference, pp. 178-188, June 1983.

[8] P. Willett: "Recent Trends in Hierarchic Document
Clustering: A Critical Review", Information Processing and
Management, 24(5):577-597, 1988.

[9] R. Wright, A. Getchell, T. Howes, S. Sataluri, P. Yee,
and W. Yeong: "Recommendations for an X.500 Production
Directory Service", Request For Comments rfc1803.txt,
anonymous ftp from ds.internic.net/rfc, June 1995.

The present invention is relevant to the new and fast
growing area of Internet Information Resource Discovery and
Retrieval, also known as the Web. The Web can perhaps be
best described as a global hypertext application running
over the Internet. The Web includes tools for naming and
retrieving any Internet Information Resource (also referred
to hereafter as just resource for short), for authoring
hypertext documents that point to those resources (and to
other hypertext documents, which are themselves resources),
for viewing the hypertext documents, and for searching
resource collections.

Two examples of popular Web software tools are:

(1) The World Wide Web (WWW), which includes global
resource naming and locating (Uniform Resource Locators,
URLs [2]), retrieval (HyperText Transfer Protocol, HTTP
[1]), and document authoring (HyperText Markup Language,
HTML).

(2) Netscape, which is a hypertext-style, multi-media
user interface to the Web [4].

The present invention is also relevant to the field of
Information Retrieval (IR). IR is a mature technology area.
Many extensive information retrieval services exist, such
as the MEDLARS medical library search service in the
U.S.A., and JOIS (JICST (Japan Information Center of
Science and Technology) Online System) in Japan.

Recently, traditional (centralized) IR and hypertext
systems have started to merge functionally. In addition, IR
systems that gather and index Web resources, such as Lycos
[3], are now available over the Web. Thus, the three areas
of hypertext, IR, and Internet Information Resource
Discovery and Retrieval are merging into one. In the
following, systems that implement these various
technologies will be referred to as Information Navigation
Systems.

Now, the limitations of the currently available
technologies for Web Navigation and IR will be briefly
described.

(1) Web Navigation 

The major problem with the current Web is the lack of
ability to search and browse (collectively called navigate)
the information in the Web. Note that searching and
browsing are highly related activities. Searching can be
described as looking for the resources that fit a
particular description, such as a specific set of keywords,
while browsing is a less focused "looking around".

Currently, it is impossible to efficiently navigate
all of the Web. Parts of the Web can be effectively
navigated, for instance by indexing some part of the Web
and placing that index on a single computer, which can then
be locally searched [3]. But this approach does not scale
to global proportions.

The basic problem is that information resources are
multi-dimensional. Any given resource will relate to
multiple topic areas. The interesting aspects of any given
resource will be viewed differently by different people. In
order to capture this multi-dimensionality (in the current
state-of-the-art), either (1) information must be
maintained about individual resources, or (2) information
must be maintained about groups of resources, where the
information about each group accurately describes the
resources that belong to the group.

The amount of information required for the former case
will of course not scale to global proportions. Approaches
along the lines of the latter case (for instance,
categorization hierarchies such as X.500 [9] or centroids
[5]) tend either to result in groups that are too general
and therefore don't accurately reflect their contents, or
in too many overlapping groups, resulting in poor scaling.

Currently there is no serious proposal for how to
efficiently navigate all of the Web.

(2) Information Retrieval and Hypertext

IR systems allow a user to search for resources in a
large database based on some description of the desired
resource, usually keywords. IR systems also allow searching
through relevance feedback, by allowing a user to indicate
a resource known to be similar to what is desired.

IR has a long history of organizing resources into
(usually hierarchical) groupings [7, 8]. This is done both
to improve search efficiency (if one relevant resource is
found, all of the resources in the same group can be
retrieved), and to improve search quality (resources in the
group might not match the keyword description but might
still be relevant).

Historically, each resource is grouped into a single,
or at best, small number of groups. Doing so reduces the
amount of memory needed to store the group information.
Since information, however, is multi-dimensional, this
limited grouping is unlikely to effectively reflect all of
the meaningful relations between documents. Indeed, the use
of grouping has not been consistently successful, perhaps
for this reason [8].
Recently, the use of IR-style groups for improving
hypertext navigation has been proposed [6]. The same
issues exist for this application of groupings as well.

It is therefore an object of the present invention to
provide an information navigation system using a
clusterized information resource topology, which provides
the underpinning for a set of techniques that may allow for
navigation of the entire Web.

According to one aspect of the present invention there
is provided an information navigation system, comprising:
information resources having an information resource
topology, in which each information resource is associated
with at least one term combination and a set of links,
where each term combination specifies a set of terms
describing each information resource and each link links
information resources with matching term combinations, and
for every existing term combination, a set of information
resources that contain said every existing term combination
form a cluster, where a cluster is defined as a set of
information resources for which there exists at least one
path between every pair of information resources in said
set of information resources such that said at least one
path contains only information resources from said set of
information resources and a path is defined as a series of
information resources connected through links; and
information navigation means for navigating through the
information resource topology.
According to another aspect of the present invention
there is provided a method of information navigation
through information resources, comprising the steps of:
forming an information resource topology among the
information resources in which each information resource is
associated with at least one term combination and a set of
links, where each term combination specifies a set of terms
describing each information resource and each link links
information resources with matching term combinations, and
for every existing term combination, a set of information
resources that contain said every existing term combination
form a cluster, where a cluster is defined as a set of
information resources for which there exists at least one
path between every pair of information resources in said
set of information resources such that said at least one
path contains only information resources from said set of
information resources and a path is defined as a series of
information resources connected through links; and
navigating through the information resource topology.

According to another aspect of the present invention
there is provided an article of manufacture, comprising: a
computer usable medium having computer readable program
code means embodied therein for causing a computer to
function as an information navigation system, the computer
readable program means including: first computer readable
program code means for causing the computer to form an
information resource topology among information resources
in which each information resource is associated with at
least one term combination and a set of links, where each
term combination specifies a set of terms describing each
information resource and each link links information
resources with matching term combinations, and for every
existing term combination, a set of information resources
that contain said every existing term combination form a
cluster, where a cluster is defined as a set of information
resources for which there exists at least one path between
every pair of information resources in said set of
information resources such that said at least one path
contains only information resources from said set of
information resources and a path is defined as a series of
information resources connected through links; and second
computer readable program code means for causing the
computer to navigate through the information resource
topology.

According to another aspect of the present invention
there is provided an article of manufacture, comprising: a
computer usable medium having computer readable program
code means embodied therein for causing a computer to
function as an information navigation system for navigating
through an information resource topology among information
resources in which each information resource is associated
with at least one term combination and a set of links,
where each term combination specifies a set of terms
describing each information resource and each link links
information resources with matching term combinations, and
for every existing term combination, a set of information
resources that contain said every existing term combination
form a cluster, where a cluster is defined as a set of
information resources for which there exists at least one
path between every pair of information resources in said
set of information resources such that said at least one
path contains only information resources from said set of
information resources and a path is defined as a series of
information resources connected through links, the computer
readable program means including: first computer readable
program code means for causing the computer to function as
a link server for storing term combinations and links of
the information resources, and answering queries about the
stored term combinations by listing the links and the
information resources associated with those stored term
combinations that fully match queried term combinations, so
as to realize a gathering function to gather all
information resources that contain a given term combination
when at least one information resource containing said
given term combination is known, by successively traversing
the links from said at least one information resource
containing said given term combination; second computer
readable program code means for causing the computer to
function as a search server for storing term combinations
and links of the information resources, and answering
queries about the stored term combinations by listing the
links and the information resources associated with those
stored term combinations that partially match queried term
combinations, so as to realize a searching function to
search at least one information resource that contains a
given term combination by successively searching clusters
with an increasing number of terms matching with said given
term combination; and third computer readable program code
means for causing the computer to function as a topology
manager to manage the information resource topology.

According to another aspect of the present invention
there is provided an article of manufacture, comprising: a
computer usable medium having computer readable program
code means embodied therein for causing a computer to
function as an information navigation system for navigating
through an information resource topology among information
resources in which each information resource is associated
with at least one term combination and a set of links,
where each term combination specifies a set of terms
describing each information resource and each link links
information resources with matching term combinations, and
for every existing term combination, a set of information
resources that contain said every existing term combination
form a cluster, where a cluster is defined as a set of
information resources for which there exists at least one
path between every pair of information resources in said
set of information resources such that said at least one
path contains only information resources from said set of
information resources and a path is defined as a series of
information resources connected through links, the computer
readable program means including: first computer readable
program code means for causing the computer to function as
a gather client for sending queries about a term
combination to one link server, and thereby learning fully
matching information resources and additional relevant link
servers, so as to realize a gathering function to gather
all information resources that contain a given term
combination when at least one information resource
containing said given term combination is known, by
successively traversing the links from said at least one
information resource containing said given term
combination; and second computer readable program code
means for causing the computer to function as a search
client for sending queries about a given term combination
to one search server, and thereby learning partially
matching information resources and additional relevant
search servers, so as to realize a searching function to
search at least one information resource that contains said
given term combination by successively searching clusters
with an increasing number of terms matching with said given
term combination.

Other features and advantages of the present invention
will become apparent from the following description taken
in conjunction with the accompanying drawings.

The present invention will now be described with reference
to the accompanying drawings, in which:

Fig. 1 is a diagram showing an example of resources,
links and clusters in the information navigation system
according to the present invention.

Fig. 2 is a diagram showing an example of a resource
entry table in the information navigation system according
to the present invention.

Fig. 3 is a flow chart for an operation of a link
server to gather the list of resource entry answers in the
information navigation system according to the present
invention.

Fig. 4 is a flow chart for an operation of a gather
client to gather all resources with a given term
combination in the information navigation system according
to the present invention.

Fig. 5 is a diagram showing an example of a search
procedure in the information navigation system according to
the present invention.

Fig. 6 is a flow chart for an operation of a search
server to generate the list of resource entry answers in
the information navigation system according to the present
invention.

Fig. 7 is a flow chart for an operation of a search
client to search for a resource or minimum collection of
resources with a given term combination in the information
navigation system according to the present invention.

Fig. 8 is a diagram showing an example of incorrect
re-attachment at a time of deleting a resource in the
information navigation system according to the present
invention.

Fig. 9 is a flow chart for an operation of a resource
addition sub-system to add a resource to the resource entry
table in the information navigation system according to the
present invention.

Fig. 10 is a flow chart for an operation of a resource
deletion sub-system to delete a resource from the resource
entry table in the information navigation system according
to the present invention.

Fig. 11 is a flow chart for an operation of a link
deletion sub-system to search for a resource or minimum
collection of resources with a given term combination and
sequence number threshold in the information navigation
system according to the present invention.

Fig. 12 is a flow chart for an operation of a link
deletion sub-system to delete a link from the resource
entry table in the information navigation system according
to the present invention.

Fig. 13 is a schematic block diagram of exemplary
networked computer systems for implementing the information
navigation system according to the present invention.

Fig. 14 is a diagram showing an example of a message
format used in the information navigation system according
to the present invention.

Fig. 15 is a block diagram of an internal
configuration of an Ingrid server computer constituting the
information navigation system according to the present
invention.

Fig. 16 is a block diagram of an internal
configuration of an Ingrid client computer constituting the
information navigation system according to the present
invention.

Fig. 17 is a diagram showing an example of a link
query message used in the information navigation system
according to the present invention.

Fig. 18 is a diagram showing an example of a link
answer message used in the information navigation system
according to the present invention.

Fig. 19 is a diagram showing an example of a search
query message used in the information navigation system
according to the present invention.

Fig. 20 is a diagram showing an example of a search
answer message used in the information navigation system
according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Now, the preferred embodiments of the information
navigation system according to the present invention will
be described in detail with reference to the drawings.

1. Overview of the Invention

In the following, an information navigation system
according to the present invention will be referred to as
Ingrid. There are four major components of Ingrid. They
are:

1. The Ingrid topology, which defines the
organizational structure of resources.

2. Gathering, which defines how to retrieve all
desired resources from the Ingrid topology once at least
one such resource is known.

3. Searching, which defines how to find at least one
desired resource.

4. Distributed topology creation and maintenance.

These features of Ingrid will be briefly outlined in
this section, and described in greater detail in the
following sections.

Ingrid Topology

Briefly stated, Ingrid allows resources to be placed
in multiple (perhaps hundreds of) overlapping groups
without requiring that any system maintain explicit
information about the multiple groups. This allows
resources to be distributed across multiple computer
systems and searched using any combination of terms without
requiring excessive memory in any given computer system.

Ingrid does this by organizing the resources in each
overlapping group into a sparsely-connected mesh network.
This network has the special property that every group is a
topologically connected subgraph (called a cluster) in the
network.

For instance, consider the example shown in Fig. 1,
which shows a number of resources organized into a
sparsely-connected mesh network, and some of the clusters
(A, B, and C) latent in that network organization. Each of
the clusters is topologically connected in that a path
exists from any resource in the cluster to any other
resource without leaving the cluster. Fig. 1 also shows
that the clusters can overlap (or, conversely, that a
resource can belong to multiple clusters). For instance,
resource f belongs to both clusters B and C.

The clusters are said to be "latent" in the network
organization because no system needs to know the membership
of the clusters. Instead, associated with each resource is
a set of terms that describe the resource (for instance,
author, title, and keywords). If one resource with the
desired set of terms is known to a searcher, then all
relevant resources can be efficiently found by simply
traversing the links that lead to other resources with
those terms.

As a result, the full multi-dimensionality of
resources can be captured without having to maintain in any
one place explicit information about all resources or all
clusters, or even all clusters a given resource belongs to
or all resources a given cluster contains.

Gathering 

Because of the cluster organization of Ingrid, if a
single resource is known that contains the desired terms,
all such resources can efficiently be found by simply
successively traversing the links within the cluster (that
is, those links that lead to resources containing the
desired terms).
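
The gathering idea can be illustrated with a minimal single-process sketch. It is not the message-based gather client and link server procedure of Figs. 3 and 4; the function name `gather`, the dictionaries `terms_of` and `links`, and the sample resources are all hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set


def gather(start: str,
           tc: FrozenSet[str],
           terms_of: Dict[str, List[FrozenSet[str]]],
           links: Dict[str, Set[str]]) -> Set[str]:
    """Collect every resource reachable from `start` by following only
    links that lead to resources containing the term combination `tc`."""

    def has(resource: str) -> bool:
        # A resource contains tc if one of its term combinations
        # is identical to or a superset of tc.
        return any(own >= tc for own in terms_of[resource])

    if not has(start):
        return set()
    found = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in links[current]:
            if neighbor not in found and has(neighbor):
                found.add(neighbor)
                queue.append(neighbor)
    return found


# Hypothetical topology (assumption, not taken from Fig. 1).
terms_of = {
    "a": [frozenset({"x", "y"})],
    "b": [frozenset({"x"})],
    "c": [frozenset({"x", "y"})],
    "d": [frozenset({"x", "y"})],
}
links = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
```

With this data, gathering from resource a for the combination {x, y} visits a, c, and d and never expands b, because b does not contain the combination.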

Searching

Resources that contain desired terms can be found in
the Ingrid topology by successively searching clusters with
more and more of the matching terms. Each time an
additional matching term is found, the scope of the search
is greatly reduced.
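
One possible reading of this narrowing search is sketched below. It is a simplified, single-process illustration under stated assumptions, not the search client and search server procedure of Figs. 6 and 7: the names `search` and `best_overlap`, the in-memory dictionaries, and the sample data are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def best_overlap(tcs: List[FrozenSet[str]],
                 wanted: FrozenSet[str],
                 base: FrozenSet[str]) -> Optional[FrozenSet[str]]:
    """Largest set of wanted terms matched by one term combination that
    also contains `base`; None if no term combination contains `base`."""
    best = None
    for own in tcs:
        if own >= base:
            overlap = own & wanted
            if best is None or len(overlap) > len(best):
                best = overlap
    return best


def search(start: str,
           wanted: FrozenSet[str],
           terms_of: Dict[str, List[FrozenSet[str]]],
           links: Dict[str, Set[str]]) -> Tuple[str, FrozenSet[str]]:
    """Move through clusters with more and more matching terms."""
    current = start
    matched = best_overlap(terms_of[start], wanted, frozenset()) or frozenset()
    while matched != wanted:
        # Explore only the cluster of resources containing `matched`.
        seen = {current}
        queue = deque([current])
        improved = False
        while queue and not improved:
            node = queue.popleft()
            for neighbor in links[node]:
                if neighbor in seen:
                    continue
                overlap = best_overlap(terms_of[neighbor], wanted, matched)
                if overlap is None:
                    continue  # neighbor lies outside the current cluster
                seen.add(neighbor)
                if len(overlap) > len(matched):
                    current, matched, improved = neighbor, overlap, True
                    break
                queue.append(neighbor)
        if not improved:
            break  # no resource with more matching terms is reachable
    return current, matched


# Hypothetical data for illustration only.
terms_of = {
    "r1": [frozenset({"web"})],
    "r2": [frozenset({"web", "search"})],
    "r3": [frozenset({"web", "search", "cluster"})],
    "r4": [frozenset({"music"})],
}
links = {"r1": {"r2", "r4"}, "r2": {"r1", "r3"}, "r3": {"r2"}, "r4": {"r1"}}
```

Each time a resource with an additional matching term is found, the next exploration is confined to the smaller cluster defined by the larger matched set, which is the scope reduction the text describes.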

Creation and Maintenance 

For a new resource to join the Ingrid topology, new
links are created between the new resource and one or more
resources already in the Ingrid topology. No other links
are added or deleted. To find the resources to which links
should be added, a search is made for resources with as
many as possible of the same terms as those describing the
new resource. The new resource adds links to the smallest
possible number of found resources such that, for every
possible combination of terms in the new resource, it is
connected to at least one other resource with those terms
(or none, if no such resource exists).
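
The link selection rule can be sketched as follows, under the simplifying assumption (not made in the text) that every resource is described by a single term combination. In that case a found resource covers every combination of the terms it shares with the new resource, so it suffices to keep one found resource per maximal shared term set. The function name `choose_link_targets` and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, List


def choose_link_targets(new_terms: FrozenSet[str],
                        candidates: Dict[str, FrozenSet[str]]) -> List[str]:
    """Pick a smallest set of found resources such that every term
    combination of the new resource shared with some candidate is also
    shared with a chosen candidate (single term combination each)."""
    overlaps = {name: terms & new_terms
                for name, terms in candidates.items()
                if terms & new_terms}
    chosen: List[str] = []
    kept: List[FrozenSet[str]] = []
    # Visit larger overlaps first; ties are broken by name for determinism.
    for name in sorted(overlaps, key=lambda n: (-len(overlaps[n]), n)):
        # Skip a candidate whose shared terms are already covered.
        if not any(overlaps[name] <= other for other in kept):
            chosen.append(name)
            kept.append(overlaps[name])
    return chosen


# Hypothetical found resources for a new resource with terms {a, b, c}.
found = {
    "r1": frozenset({"a", "b", "x"}),
    "r2": frozenset({"a"}),
    "r3": frozenset({"c", "y"}),
    "r4": frozenset({"z"}),
}
targets = choose_link_targets(frozenset({"a", "b", "c"}), found)
```

Here r2 is skipped because the link to r1 already covers the combination {a}, and r4 is skipped because it shares no term with the new resource. Combinations such as {a, c} get no link, since no found resource contains them.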

1.1 Requirements for Physical Implementation of Ingrid

Ingrid is implemented on networked computer systems.
An example of two networked computer systems is shown in
Fig. 13, where two computers 1300 are networked through a
computer network 1310. Each computer 1300 has a CPU 1301, a
memory 1302 readable and writable by the CPU 1301, a
communications port 1303 readable and writable by the CPU
1301, and an identifier (not shown).

The communications port 1303 is connected to the
computer network 1310, which allows any computer
participating in an Ingrid topology to send and receive
messages with any other computer participating in the same
Ingrid topology.

In general, each message is given in a message format
1400 shown in Fig. 14, which contains at least the
following information:

1. Destination Identifier 1401. This identifier is
used by the computer network to send the message to the
identified computer.

2. Source Identifier 1402. This identifier identifies
the computer that sent the message.

3. Destination Sub-system 1403. This field indicates
which sub-system in the destination computer should process
the message.

4. Source Sub-system 1404. This field indicates the
sub-system in the source computer that sent the message.

5. Message Type 1405. This field indicates what action
should be taken.

6. Message Body 1406. This field contains information
relevant to the command.
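
The six fields listed above could be represented in memory along the following lines. This is only a sketch: the class name `IngridMessage`, the field names, the field types, the helper `make_reply`, and the example values are assumptions, and no wire encoding is implied.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class IngridMessage:
    destination_id: str         # Destination Identifier 1401
    source_id: str              # Source Identifier 1402
    destination_subsystem: str  # Destination Sub-system 1403
    source_subsystem: str       # Source Sub-system 1404
    message_type: str           # Message Type 1405
    body: Any                   # Message Body 1406


def make_reply(request: IngridMessage, message_type: str, body: Any) -> IngridMessage:
    """Build an answer addressed back to the sender of `request` by
    swapping the source and destination fields."""
    return IngridMessage(
        destination_id=request.source_id,
        source_id=request.destination_id,
        destination_subsystem=request.source_subsystem,
        source_subsystem=request.destination_subsystem,
        message_type=message_type,
        body=body,
    )


# Hypothetical identifiers and type names, for illustration only.
query = IngridMessage("server-1", "client-7", "link server",
                      "gather client", "link query", {"terms": ["x", "y"]})
answer = make_reply(query, "link answer", {"resources": []})
```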

In general, the networked computer systems contain an
Ingrid server computer and an Ingrid client computer.

The Ingrid server computer has an internal
configuration as shown in Fig. 15, where the Ingrid server
computer 1500 generally comprises a CPU 1501 and a memory
1502. The memory 1502 has: a topology manager sub-system
1510 containing a resource addition sub-system 1511, a
resource deletion sub-system 1512, a link addition
sub-system 1513, and a link deletion sub-system 1514; a link
server sub-system 1520 containing a list of resource entry
answers 1521; a search server sub-system 1530 containing a
list of resource entry answers 1531; and a resource entry
table 1540 connected with the topology manager sub-system
1510, the link server sub-system 1520, and the search
server sub-system 1530.

The Ingrid client computer has an internal
configuration as shown in Fig. 16, where the Ingrid client
computer 1600 generally comprises a CPU 1601 and a memory
1602. The memory 1602 has: a gather client sub-system 1610
containing a queried list 1611 and a found list 1612; a
search client sub-system 1620 containing a queried list
1621 and a found list 1622; and a resources list
1630 connected with the gather client sub-system 1610 and
the search client sub-system 1620.

Each element of the Ingrid server computer 1500 of
Fig. 15 and the Ingrid client computer 1600 of Fig. 16 will
be described in detail below.

2. Ingrid Topology Definition

Assume a set of information resources, or just
resources. Each resource is described by a set of one or
more term combinations, where each term combination is
itself a set of terms. A resource R is said to contain a
given term combination TC if one of resource R's term
combinations is identical to or a superset of the given
term combination TC.
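
The "contains" relation defined above maps directly onto a set comparison. A minimal sketch follows; the function name `contains` and the sample terms are hypothetical.

```python
from typing import FrozenSet, Iterable

TermCombination = FrozenSet[str]


def contains(resource_tcs: Iterable[TermCombination], tc: TermCombination) -> bool:
    """True if one of the resource's term combinations is identical to
    or a superset of the given term combination tc."""
    return any(own >= tc for own in resource_tcs)


# Hypothetical resource described by two term combinations.
example_resource = [
    frozenset({"ingrid", "topology", "cluster"}),
    frozenset({"smith"}),
]
```

Note that the resource above contains {ingrid, cluster}, a subset of its first term combination, but does not contain {ingrid, smith}, because those two terms never occur together in a single term combination.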

Typical examples of terms in a resource's term
combination are keywords, title words, and author names.
The terms, however, are not limited to these. In
particular, it is not necessary that the terms are actually
found in the resource itself. Nor is it necessary that the
resource itself be text, or even retrievable by a computer.
It is only necessary that the Ingrid software know the
terms that describe the resource, and have access to a
description of how to obtain the resource.

Associated with each resource is a set of zero or more
pointers, each pointing to another resource. Each pointer
allows for the retrieval of (1) the other resource, (2) the
other resource's term combinations, and (3) the other
resource's pointers. Thus, one is able to move from
resource to resource by following the pointers. Pointers
are bidirectional, meaning that if resource A has a pointer
to resource B, then resource B also has a pointer to
resource A. Such a pair of pointers is called a link.
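
The pairing of pointers into links can be sketched with a small in-memory structure. The class `ResourceEntry`, its fields, and the helpers `add_link` and `remove_link` are hypothetical names used only for illustration; they are not the resource entry table of Fig. 2.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class ResourceEntry:
    name: str
    term_combinations: Set[FrozenSet[str]] = field(default_factory=set)
    pointers: Set[str] = field(default_factory=set)  # names of linked resources


def add_link(table: Dict[str, ResourceEntry], a: str, b: str) -> None:
    """Create a link as a pair of pointers, one in each direction."""
    table[a].pointers.add(b)
    table[b].pointers.add(a)


def remove_link(table: Dict[str, ResourceEntry], a: str, b: str) -> None:
    """Remove both pointers so that no one-way pointer is left behind."""
    table[a].pointers.discard(b)
    table[b].pointers.discard(a)


table = {name: ResourceEntry(name) for name in ("A", "B")}
add_link(table, "A", "B")
```

Updating both entries in one helper keeps the invariant that a pointer from A to B always has a matching pointer from B to A.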
Define a path as a series of resources <R_1, R_2, R_3,
..., R_n> such that there exist links R_1-R_2, R_2-R_3, ...,
R_(n-1)-R_n, where R_x-R_y denotes the link between R_x and R_y.
Define a cluster as a set of resources R for which there
exists one or more paths between every pair of resources in
R such that the paths contain only resources from set R.

For instance, consider the example shown in Fig. 1
again. Fig. 1 shows a set of resources a through k
connected by various links. Fig. 1 also shows three
clusters A, B, and C. A is a cluster because there exists a
path between each of resources a, b, c, and d that includes
only resources a, b, c, and d.

Fig. 1 also gives an example of what isn't a cluster.
Resources c, d, f, and k do not form a cluster because, for
instance, the only paths between d and f include either b
or e, neither of which belongs to the set c, d, f, and k.
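
The cluster definition can be checked mechanically by testing whether a set of resources is connected when only its own members may be used as intermediate steps. The sketch below uses a breadth-first traversal; the function name `is_cluster` and the small chain of resources p, q, r, s are hypothetical and are not the topology of Fig. 1.

```python
from collections import deque
from typing import Dict, Set


def is_cluster(links: Dict[str, Set[str]], members: Set[str]) -> bool:
    """True if a path exists between every pair of members that passes
    only through members. An empty set is treated as a cluster here,
    which is an assumption of this sketch."""
    if not members:
        return True
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in links.get(current, set()):
            # Never step outside the candidate set.
            if neighbor in members and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == members


# Hypothetical chain p - q - r - s.
chain = {"p": {"q"}, "q": {"p", "r"}, "r": {"q", "s"}, "s": {"r"}}
```

In this chain, {p, q, r} is a cluster, while {p, r} is not, because the only path between p and r passes through q, which lies outside the set.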

It is now possible to define the Ingrid topology as
follows:

Definition: An Ingrid topology is a topology
consisting of resources and their links, whereby for every
existing term combination, the set of resources that
contain that term combination form a cluster.
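
For a small topology held in memory, the definition can be tested by brute force. The sketch below assumes that "every existing term combination" means every non-empty subset of a term combination of some resource; this reading, the function names, and the sample data are assumptions. The enumeration is exponential in the number of terms and is meant only as an illustration, not as something a distributed Ingrid system would run.

```python
from collections import deque
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Set


def _connected(links: Dict[str, Set[str]], members: Set[str]) -> bool:
    """True if `members` is connected using only members as steps."""
    if not members:
        return True
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in links.get(node, set()):
            if nb in members and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == members


def _subcombinations(tc: FrozenSet[str]) -> Iterator[FrozenSet[str]]:
    terms = sorted(tc)
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            yield frozenset(combo)


def violations(terms_of: Dict[str, List[FrozenSet[str]]],
               links: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Term combinations whose containing resources do not form a cluster."""
    checked: Set[FrozenSet[str]] = set()
    bad: List[FrozenSet[str]] = []
    for tcs in terms_of.values():
        for tc in tcs:
            for sub in _subcombinations(tc):
                if sub in checked:
                    continue
                checked.add(sub)
                members = {name for name, own in terms_of.items()
                           if any(o >= sub for o in own)}
                if not _connected(links, members):
                    bad.append(sub)
    return bad


# Hypothetical data: x and z share {a, b}, but are joined only through y.
terms_of = {
    "x": [frozenset({"a", "b"})],
    "y": [frozenset({"a"})],
    "z": [frozenset({"a", "b"})],
}
broken = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
repaired = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
```

In the `broken` topology the resources containing {b} and {a, b} are x and z, whose only connecting path runs through y, so both combinations are reported. Adding the link between x and z, as in `repaired`, removes the violations.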

Note that, while not explicitly a part of the above
definition, it is intended that the clusters are sparsely
connected. That is, each member of the cluster will only
have a few links to other members of the cluster. This
sparseness is one of the factors that contributes to the
good scaling characteristics of the Ingrid topology.

It is also intended that the Ingrid topology is
embodied across multiple physically-separate networked
computer systems, up to and including the case where every
resource's term combinations and associated links are
stored on a separate physical computer.
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Note in particular that a resource does not need to 
have associated with It any explicit Information about what 
clusters It belons:s to. nor does It require such 
Information to exist anywhere. As defined above, each 
resource has associated with It only Its term combinations 
and Its pointers. 

This lack of explicit information also contributes to
good scaling, especially in the case where there are a
large number of resources, each of which contains a large
number of terms or term combinations. By good scaling, it
is meant that the amount of memory, CPU, or bandwidth
required to maintain the Ingrid topology grows much slower
than the number of resources, users, computers, and
navigations. These two characteristics (sparse links and
the absence of explicit cluster information) allow a large
amount of information (that is, which resources contain
which term combinations) to be (indirectly) encoded with a
relatively small number of links. As discussed in detail
below, the indirectly encoded information can be
efficiently retrieved as needed by traversing the links of
the Ingrid topology.

2.1 Approximate Ingrid Topology

In practice, it may be impractical or impossible to
create an exact Ingrid topology, i.e., one whereby every
cluster contains all of the resources that contain the term
combination associated with that cluster. Instead, it may
be practical only to approximate an Ingrid topology. That
is, some percentage of the resources that should belong to
a given cluster may in fact not belong.

An approximate Ingrid topology may be nearly as useful
as an exact Ingrid topology. This is in part because
resource searching is never an exact process even under the
best of conditions. For instance, the selection of keywords
(by authors, the searching system, or the user) is always
inexact. In addition, Ingrid searching software can
compensate for the fact that the Ingrid topology itself may
not be perfect, as will be described in detail below.

In the following description, unless otherwise stated,
any mention of the Ingrid topology is assumed to include
the approximate Ingrid topology.

3. Retrieving All Resources that Contain a Given Term
Combination

In many cases it is useful to be able to find all (or
in the case of an approximate Ingrid topology, most)
resources in an Ingrid topology that contain a certain term
combination given that at least one such resource is known.
An example of this is finding all resources with a given
keyword. This function is called gathering.

Given that one resource in an Ingrid topology that
contains a given term combination is known, it is possible
to retrieve all resources that contain the term combination
by following links to nearby resources that contain the
term combination until all such resources are visited.

Define the neighbors of a given resource to be all
other resources with which the given resource shares a
link. The way to gather all resources in a cluster is to
successively retrieve neighbor resources and see if they
contain the desired term combination. If a neighbor does,
then the neighbors of that neighbor are retrieved and
checked, and so on. All resources that contain the term
combination are known to be found when no new neighbors of
any found resource contain the term combination.
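
The neighbor-following procedure above can be sketched as a
simple graph traversal. This is an illustrative sketch only:
the function name, the in-memory dictionaries, and the reading
of "contains" as "has a term combination identical to or a
superset of the desired one" are assumptions, not part of the
described embodiment (which distributes this work across
link servers, as described in section 3.1).

```python
def gather(start, wanted, term_combs, neighbors):
    """Collect every resource reachable from `start` through
    resources that all contain the term combination `wanted`.

    term_combs: resource -> list of frozensets of terms
    neighbors:  resource -> iterable of linked resources
    """
    wanted = frozenset(wanted)

    def contains(resource):
        return any(wanted <= tc for tc in term_combs[resource])

    if not contains(start):
        return set()
    found = {start}      # resources known to contain `wanted`
    checked = {start}    # resources already retrieved and tested
    frontier = [start]
    while frontier:
        resource = frontier.pop()
        for neighbor in neighbors.get(resource, ()):
            if neighbor in checked:
                continue
            checked.add(neighbor)
            if contains(neighbor):
                # Only matching neighbors are expanded further.
                found.add(neighbor)
                frontier.append(neighbor)
    return found
```

The traversal stops on its own once no unchecked neighbor of
any found resource contains the term combination, which is the
termination condition stated above.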


3.1 Embodiment of the Gathering Function 

The physical embodiment of the gathering function
requires two different sub-systems. Each sub-system is
implemented on a networked computer as described in section
1.1. One sub-system is known as the link server, and is
used to (1) store the term combinations and links of
resources, and (2) answer queries about those term
combinations. This corresponds to the link server
sub-system 1520 shown in Fig. 15.

The other sub-system is known as the gather client,
which gathers resources by making queries to link servers.
This corresponds to the gather client sub-system 1610 shown
in Fig. 16.

3.1.1 Embodiment of the Link Server 

The link server is embodied in a computer system as
described in section 1.1. The memory of the computer system
contains a resource entry table of one or more resource
entries. Each resource entry contains:

1. A pointer to a resource (which may or may not be
stored on the same computer). This pointer uniquely
identifies the resource among all resources.

2. The term combinations associated with the resource.

3. The sequence number of the resource. (The sequence
number is only used for Ingrid topology creation and
maintenance described below.)

4. The links associated with the resource. Each link
contains:

(a) The resource pointer of the remote resource.

(b) The term combination associated with the link.
Each link's term combination is a (perhaps proper) subset
of one of the resource's term combinations.

(c) The identifier of the computer containing the
resource entry for the remote resource.

An exemplary form of the resource entry table is shown
in Fig. 2.

The link server is able to receive a link query
message from another computer system (specifically, from a
gather client). The link query contains at least the term
combination being gathered. More specifically, the link
query message is given in a format shown in Fig. 17, where
the link query message 1700 includes a destination ID 1701
as the destination identifier, a source ID 1702 as the
source identifier, a link server sub-system 1703 as the
destination sub-system, a gather client sub-system 1704 as
the source sub-system, a link query 1705 as the message
type, and a term combination 1706 as the message body.

Upon receiving a link query, the link server sends a
link answer message back to the client identified by the
link query message. The link answer contains at least a
list of resource entry answers. More specifically, the link
answer message is given in a format shown in Fig. 18, where
the link answer message 1800 includes a destination ID 1801
as the destination identifier, a source ID 1802 as the
source identifier, a gather client sub-system 1803 as the
destination sub-system, a link server sub-system 1804 as
the source sub-system, a link answer 1805 as the message
type, and a resource entry answer 1806 as the message body.

The list of resource entry answers contains one entry
for every resource entry term combination stored by the
link server that fully matches (is identical to or is a
superset of) the term combination in the link query. Each
resource entry answer contains at least:

1. The resourcePointer of the entry containing the
matching term combination.

2. The links associated with the matching term
combination.

More specifically, the list of resource entry answers
is generated according to the flow chart shown in Fig. 3,
as follows.

First, a link query message with a term combination
TCin is received (step 301), and the list of resource entry
answers is cleared (step 302).

Then, for all resourceEntries RE in the resource entry
table, and for all termCombs TC in each resourceEntry RE,
the following steps 303 to 307 are carried out.

Namely, whether each termComb TC is a full subset of
the term combination TCin or not is judged (step 303), and
if so, a new list entry LE is created and added to the list
of resource entry answers (step 304), and the
resourcePointer of the resourceEntry RE is added to the new
entry LE (step 305).

Then, for all links L in the resourceEntry RE, the
following steps 306 and 307 are carried out. Namely,
whether the termComb of the link L is a full subset of the
term combination TCin or not is judged (step 306), and if
so, the link L is added to the new entry LE (step 307).

After these steps 303 to 307 are completed for all
termCombs TC in all resourceEntries RE, a link answer
message containing the list of resource entry answers so
generated is sent to the client identified by the link
query message with the term combination TCin (step 308).
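
As an illustration only, the resource entry table and the
steps 301 to 308 might be sketched as follows. The class and
field names are hypothetical, message framing (Figs. 17 and
18) is omitted, and the matching test assumes the reading
given before the flow chart, namely that a stored term
combination "fully matches" when it is identical to or a
superset of the term combination in the query.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    remote_pointer: str      # resource pointer of the remote resource
    term_comb: frozenset     # term combination associated with the link
    remote_server: str       # computer holding the remote resource entry

@dataclass
class ResourceEntry:
    resource_pointer: str
    term_combs: list                      # list of frozensets of terms
    sequence_number: int = 0              # used only for maintenance
    links: list = field(default_factory=list)

def answer_link_query(table, tc_in):
    """Build the list of resource entry answers for a link query."""
    tc_in = frozenset(tc_in)
    answers = []                                        # step 302
    for entry in table:
        for tc in entry.term_combs:
            if tc_in <= tc:                             # step 303 (assumed reading)
                matching_links = [link for link in entry.links
                                  if tc_in <= link.term_comb]   # steps 306-307
                answers.append((entry.resource_pointer,
                                matching_links))        # steps 304-305
    return answers                                      # sent in step 308
```

Each answer pairs a resource pointer with the subset of its
links that are relevant to the queried term combination, so a
gather client can follow only those links.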

Note that, taken by itself, there is nothing novel
about the functionality of a single link server. It is
similar to the functionality of any search engine. It is
the total combined functionality of all link servers spread
over multiple networked computers that creates a working
Ingrid topology.

3.1.2 Embodiment of the Gather Client 

The gather client is embodied in a computer system as
described in section 1.1. Initially, the gather client
knows of one link server that contains a resource entry
that matches the desired term combination. To gather all
resources with the term combination, the gather client
maintains the following three lists associated with the
gathering operation:

1. The found list, which contains the link servers not
yet queried that are known to contain at least one fully
matching resource.

2. The queried list, which contains the link servers
that have already been queried.

3. The resources list, which contains the matching
resources.

The operation according to the flow chart shown in
Fig. 4 is executed by the gather client, as follows.

First, a command to gather all resources with a term
combination TCg is received (step 401), and the resources
list is cleared (step 402), while the queried list is also
cleared (step 403) and the link servers for resources known
to contain the term combination TCg are put into the found
list (step 404).

Then, while the found list is not empty, the following
steps 405 to 411 are carried out.

Namely, one link server LS is moved from the found
list to the queried list (step 405), and a link query
message containing the term combination TCg is sent to this
link server LS (step 406).

Next, for each link L in the list of resource entry
answers received from the link server LS, the following
steps 407 to 409 are carried out. Namely, whether the link
server LS1 of the link L is in the found list or not is
judged (step 407), and if not, whether the link server LS1
of the link L is in the queried list or not is judged (step
408). If not, the link server LS1 is added to the found
list (step 409).
Next, for each resource R in the list of resource
entry answers received from the link server LS, the
following steps 410 and 411 are carried out. Namely,
whether the resource R is in the resources list or not is
judged (step 410), and if not, the resource R is added to
the resources list (step 411).

After these steps 405 to 411 are completed for all the
link servers in the found list, the resources list now
contains all resources in the Ingrid topology with the term
combination TCg (step 412).

Thus, when these steps are completed, all of the
resources that contain the desired term combination will be
listed in the resources list.
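
The gather client loop of steps 401 to 412 can be sketched as
below. This is a minimal sketch under stated assumptions: the
function name is hypothetical, send_link_query stands in for
the exchange of link query and link answer messages, and each
answer is assumed to be a pair of a resource pointer and a
list of (remote resource, remote link server) pairs. The small
three-server network is invented purely for demonstration.

```python
def gather_client(tc_g, initial_servers, send_link_query):
    """Return all resources found for term combination `tc_g`."""
    resources = []                       # step 402
    queried = []                         # step 403
    found = list(initial_servers)        # step 404
    while found:
        server = found.pop(0)            # step 405: move LS from found ...
        queried.append(server)           # ... to queried
        answers = send_link_query(server, tc_g)            # step 406
        for resource, links in answers:
            for _remote_resource, remote_server in links:  # steps 407-409
                if remote_server not in found and remote_server not in queried:
                    found.append(remote_server)
            if resource not in resources:                  # steps 410-411
                resources.append(resource)
    return resources                     # step 412

# Invented network: each server's canned link answer for the query.
fake_network = {
    "s1": [("r1", [("r2", "s2")])],
    "s2": [("r2", [("r1", "s1"), ("r3", "s3")])],
    "s3": [("r3", [("r2", "s2")])],
}

def fake_send(server, tc):
    # Stand-in for the network round trip; ignores `tc` for brevity.
    return fake_network[server]

result = gather_client(frozenset("AB"), ["s1"], fake_send)
```

Because every server is moved to the queried list before it is
contacted, no link server is queried more than once, even when
links point back to servers already visited.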

Note that, in this method, a single link query covers
all the resources stored in a link server, and therefore
can return the links of multiple resources. It is the link
server that determines whether or not a given resource
contains a given term combination. The gather client is
responsible for remembering which link servers have been
queried, and for saving the results of queries.

This is a minor modification of the basic approach
described in section 3, in which the client retrieves the
term combinations and links for specific resources, and
determines itself whether or not that resource's term
combination matches, and therefore whether to retrieve the
neighbors of that resource. That approach achieves the same
overall functionality, but is less efficient than the
method of Fig. 4.

4. Searching the Ingrid Topology

The previous section described how to retrieve the
full set of resources with a specified term combination
given that one such resource is known. This section
describes a technique for finding at least one resource
with a specified term combination. Or, if no resource with
the term combination exists, then this technique finds the
set of resources that, taken together, contain as many of
the term sub-combinations of the specified term combination
as possible.

Define a term sub-combination of a given term
combination TC as a term combination that contains some or
all of the terms of TC. For example, the term
sub-combinations of the term combination <ABC> (where A, B, C
are separate terms) are: <A>, <B>, <C>, <AB>, <AC>, <BC>,
and <ABC>. So, for instance, if the term combination <ABC>
is being searched, and no resource contains <ABC>, but
three resources contain <AB>, <AC>, and <BC> respectively,
then the search technique described here will find all
three of these resources.

Note for instance that if an additional resource with
term combination <ADE> exists, that term combination will
not necessarily be found by the search technique. The
reason is that <A> is redundant given the existence of <AB>
and <AC>. (And furthermore, <DE> is not relevant to the
search, and so is ignored.)
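
The enumeration of term sub-combinations can be made concrete
with a few lines of code. The function name is an assumption
introduced for illustration; terms are represented here as
single characters only for brevity.

```python
from itertools import combinations

def term_sub_combinations(terms):
    """Return every non-empty sub-combination of `terms`,
    smallest first, each as a frozenset."""
    terms = sorted(set(terms))
    return [frozenset(combo)
            for size in range(1, len(terms) + 1)
            for combo in combinations(terms, size)]

subs = term_sub_combinations("ABC")
```

For a term combination of n terms there are 2^n - 1
sub-combinations, so <ABC> yields the seven listed above.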

The term combination being searched is referred to as
the search terms. In what follows, it is assumed that the
searching system initially knows of at least one resource
for each of the search terms. For example, if the search
terms are <ABC>, the searching system will initially know
of at least one resource with <A>, one resource with <B>,
and one resource with <C>. (For completeness, a known
method of efficiently finding the required initial
resources is given in section 4.1.4 below.) Note that when
a searching system is said to "know of a resource", the
implication is that the searching system also knows what
computer contains the resource entry table for that
resource.

The basic mechanism used for searching is that of
referral. There is a search client that is responsible for
managing the search. There are multiple search servers.
These search servers contain pointers to resources, their
terms, and their links. The search servers also contain the
terms of the resources the links point to. If a search
server contains a resource with a given term combination,
the search server is said to be in the cluster associated
with the term combination.

The search client queries some search server with the
search terms. The search server refers the search client to
other search servers whose resources contain the search
terms. This continues either until the desired resource is
found or the search client determines that the desired
resource does not exist or is too difficult to find.

The basic approach for searching Ingrid is to
successively find search servers with more and more
matching terms until a search server is found with all the
terms. This process is illustrated in Fig. 5, which shows
five clusters as five circles, for terms A, B, C, D, and E.
The overlaps of the circles represent clusters for term
combinations with more than one term.

The search client explores each of the clusters with
one term until it finds one or more search servers that
contain resources with two matching terms. This searching
process is denoted in Fig. 5 by the arrows. The search
client is shown finding clusters for term combinations
<AB>, <AD>, <BC>, <CE>, and <DE>. At this point, any
continued searching of clusters with one term is (perhaps
temporarily) discontinued (as indicated by the solid line
across the arrow tip). This is because searching a cluster
with two terms is, in most cases, more likely to yield good
results than searching a cluster with one term. (The
exception being, for instance, a cluster for a single rare
term versus a cluster for two common terms.)

Fig. 5 shows searching branching out within the
clusters with two terms until two clusters with three terms
are found, <ABD> and <BCE>. Any further searching of
clusters with two terms is halted, and the clusters with
three terms are searched. Next, Fig. 5 shows that a cluster
with four terms <ABCE> is found. This cluster is explored
until the cluster with all five terms is found. This ends
the search.

In the example of Fig. 5, the search was able to
continually find better clusters (clusters with more
matching terms) until a full match was found. Each time a
better cluster was found, exploration of poorer clusters
was discontinued. In general, however, it may happen that a
better cluster is completely explored without finding still
better clusters. In this case, the search client can resume
searching of the poorer clusters. This process continues
until either some predetermined limits have been exhausted
(such as the total number of queries made or the total time
spent), or until all possible clusters have been fully
explored.

Note also that Fig. 5 depicts a fully correct Ingrid
topology. In the case of an approximate Ingrid topology, it
may be that a cluster being explored is actually
partitioned. In this case, one partition of the cluster may
be fully explored without in fact checking all of the
resources that contain the term combination of the cluster.
If the search client suspects that this is the case
(because, for instance, the number of resources in a
cluster is smaller than what might be expected), then it
may continue searching larger clusters in order to find the
other partitions.

If the search client does discover a partitioned
cluster, it could, as an optimization, notify the
appropriate Ingrid topology creation and maintenance
sub-systems, which could in turn repair the partition.



4.1 Embodiment of the Searching Function

As in the case of the gathering function, the
physical embodiment of the searching function requires two
different sub-systems, which in this case are the search
server corresponding to the search server sub-system 1530
shown in Fig. 15 and the search client corresponding to the
search client sub-system 1620 shown in Fig. 16.

4.1.1 Embodiment of the Search Server

The search server is similar to the link server of
section 3.1.1 in a number of ways. First, the search server
and link server share the same resource entry table.
Second, the search server is able to receive a query
message and return an answer. In the case of the search
server, the query message is called the search query.
However, the contents of the search query message body are
identical to those of the link query. More specifically,
the search query message is given in a format shown in Fig.
19, where the search query message 1900 includes a
destination ID 1901 as the destination identifier, a source
ID 1902 as the source identifier, a search server
sub-system 1903 as the destination sub-system, a search client
sub-system 1904 as the source sub-system, a search query
1905 as the message type, and a term combination 1906 as
the message body.

The answer returned for the search query is similar to
that of the link query in that the answer contains a list
of resource entry answers. However, there are two
significant differences. First, rather than simply include
resources that fully match, the search server returns a
list of resources that match fully or partially. In other
words, as long as the resource in the memory of the search
server matches at least one term, it is returned in the
search answer. Second, in addition to including the
resourcePointer and list of links in each resource entry
answer, the search server also returns the terms that
matched. Thus, the search client is able to determine which
clusters the resources in the resource entry answer belong
to. More specifically, the search answer message is given
in a format shown in Fig. 20, where the search answer
message 2000 includes a destination ID 2001 as the
destination identifier, a source ID 2002 as the source
identifier, a search client sub-system 2003 as the
destination sub-system, a search server sub-system 2004 as
the source sub-system, a search answer 2005 as the message
type, and a resource entry answer 2006 as the message body.

The operation according to the flow chart shown in
Fig. 6 is executed by the search server in response to a
search query, as follows.

First, a search query message with a term combination
TCin is received (step 601), and the list of resource entry
answers is cleared (step 602).

Then, for all resourceEntries RE in the resource entry
table, and for all termCombs TC in each resourceEntry RE,
the following steps 603 to 609 are carried out.

Namely, whether each termComb TC and the term
combination TCin partially match (i.e., share any terms) or
not is judged (step 603), and if so, a new list entry LE is
created and added to the list of resource entry answers
(step 604), the resourcePointer of the resourceEntry RE is
added to the new entry LE (step 605), and a termComb
consisting of terms common to both the termComb TC and the
term combination TCin is added to the new entry LE (step
606).

Then, for all links L in the resourceEntry RE, the
following steps 607 to 609 are carried out. Namely, whether
any of the terms of the link L are contained in the term
combination TCin or not is judged (step 607), and if so,
the link L is added to the new entry LE (step 608), and a
termComb consisting of terms common to the link L and the
term combination TCin is added to the new entry LE (step
609).

After these steps 603 to 609 are completed for all
termCombs TC in all resourceEntries RE, a search answer
message containing the list of resource entry answers so
generated is sent to the client identified by the search
query message with the term combination TCin (step 610).

Because the differences between the search server and
the link server are minor, and particularly because the
identical memory table is used in both, the two sub-systems
would normally be implemented on the same computer, as in
the Ingrid server computer 1500 of Fig. 15.
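
The partial-match answer of steps 601 to 610 might be sketched
as follows. The function name and the simplified table layout
(tuples rather than the full resource entry of Fig. 2) are
assumptions made to keep the example short; message framing
(Figs. 19 and 20) is omitted.

```python
def answer_search_query(table, tc_in):
    """Build the resource entry answers for a search query.

    table: list of (resource_pointer, term_combs, links) where
           term_combs is a list of frozensets and links is a list
           of (remote_server, link_term_comb) pairs.
    """
    tc_in = frozenset(tc_in)
    answers = []                                      # step 602
    for resource_pointer, term_combs, links in table:
        for tc in term_combs:
            common = tc & tc_in
            if not common:                            # step 603: no shared term
                continue
            matched_links = []
            for remote_server, link_tc in links:
                link_common = link_tc & tc_in         # step 607
                if link_common:
                    # steps 608-609: the link plus the terms it shares
                    matched_links.append((remote_server, link_common))
            answers.append({                          # steps 604-606
                "resource": resource_pointer,
                "matched": common,
                "links": matched_links,
            })
    return answers                                    # sent in step 610
```

Returning the matched terms alongside each resource and link is
what lets the search client decide which cluster a referral
belongs to without any further query.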

4.1.2 Embodiment of the Search Client

Initially, for each term in the desired term
combination, the search client knows of one search server
that contains a resource entry that matches that term. Like
the gather client of section 3.1.2, the search client
maintains three lists during the search process. However,
the contents of these lists are somewhat different,
reflecting the more complex process of searching as
compared to gathering, as follows.

1. The found list contains the search servers that are
known to contain at least one partially matching resource,
but that have not yet been queried. Associated with each
search server in the found list is the best term
combination known for that search server. By best term
combination it is meant the term combination with the most
terms matching that of the desired term combination.

2. The queried list contains the search servers that
have already been queried.

3. The resources list contains the found partially
matching resources. However, no resource in the resources
list may contain a term combination that is a subset of or
identical to that of any other resource in the resources
list.

The operation according to the flow chart shown in
Fig. 7 is executed by the search client, as follows.

First, a command to search for resources with a term
combination TCg is received (step 701), and the resources
list and the queried list are cleared (step 702), while the
search servers known to contain resources with terms from
the term combination TCg are put into the found list (step
703).

Then, while the found list is not empty and no
resource R in the resources list has a term combination
that fully matches TCg, the following steps 704 to 716 are
carried out.

Namely, the search server SS with the most terms is
selected from the found list (step 704), and this search
server SS is moved from the found list to the queried list
(step 705). Then, a search query message containing the
term combination TCg is sent to this search server SS (step
706).

Next, for each link L (containing a search server SS1
and a term combination T1) in the list of resource entry
answers received from the search server SS, the following
steps 707 to 711 are carried out. Namely, whether the
search server SS1 of the link L is in either the found list
or the queried list, or not, is judged (step 707), and if
not, the search server SS1 and the term combination T1 are
added to the found list (step 708). Otherwise, whether the
search server SS1 of the link L is in the found list or not
is judged (step 709), and if so, whether the term
combination T1 contains more terms than that of the found
list entry or not is judged (step 710). If so, the terms
for the search server SS1 in the found list are replaced
with the term combination T1 (step 711).
Next, for each resource R with the term combination T
in the list of resource entry answers received from the
search server SS, the following steps 712 to 716 are
carried out. Namely, whether the resource R is in the
resources list or not is judged (step 712). If not, for
each resource Ri with the term combination Ti in the
resources list, the following steps 713 and 714 are carried
out. That is, whether the term combination Ti is a subset
of or identical to the term combination T, or not, is
judged (step 713), and if so, the resource Ri is removed
from the resources list (step 714). Then, considering each
resource Ri with the term combination Ti remaining in the
resources list, the following steps 715 and 716 are carried
out. That is, whether none of the term combinations Ti is a
superset of the term combination T or not is judged (step
715), and if so, the resource R is added to the resources
list (step 716).

After these steps 704 to 716 are completed, the 
resources list now contains the set of found resources 
(step 717). 

Thus, when these steps are completed, the resources in
the resources list represent the set of resources that,
taken together, most fully match the various possible term
sub-combinations of the desired term combination. In the
best case, the resources list will contain only one
resource, that is, an exact match with the desired term
combination.

Specifically, disregarding all terms not in the search
terms, this search finds a set of resources such that:

1. No resource in the resources list has term
combinations that are all subsets of those of other
resources in the resources list.

2. There are no resources not in the resources list
that have term combinations that are not subsets of those
in the resources list.

Put another way, for every possible term
sub-combination that can be enumerated from the searched term
combination, the search will find at least one instance of
that term sub-combination if one exists in the Ingrid
topology. Further, every resource returned will contain at
least one term sub-combination not contained in any other
found resource.
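
The maintenance of the resources list in steps 712 to 716 can
be sketched as follows. The function name and the use of a
dictionary from resource to matched terms are assumptions made
for illustration; the referral handling of steps 704 to 711 is
not shown.

```python
def add_to_resources(resources, resource, tc):
    """Insert `resource` with matched terms `tc` while keeping no
    entry whose terms are covered by another entry.

    resources: dict mapping resource -> frozenset of matched terms
    """
    if resource in resources:                       # step 712
        return
    tc = frozenset(tc)
    # Steps 713-714: drop entries that are subsets of or identical to tc.
    for other in [r for r, t in resources.items() if t <= tc]:
        del resources[other]
    # Steps 715-716: add only if no remaining entry is a superset of tc.
    if not any(t >= tc for t in resources.values()):
        resources[resource] = tc

# Replaying the <ABC> example of section 4 with invented resource names.
found = {}
for name, terms in [("r1", "A"), ("r2", "AB"), ("r3", "AC"),
                    ("r4", "A"), ("r5", "BC")]:
    add_to_resources(found, name, terms)
```

In this replay the resource with <A> alone is displaced by the
one with <AB>, and a later resource offering only <A> is not
added, leaving the three resources with <AB>, <AC>, and <BC>.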



4.1.3 Additional Search Mechanisms 

It is almost certain that, in practice, search servers
will have information beyond that described above
for the purpose of answering search queries. This is
because, even though the search described here has the
desirable effect of continuously narrowing the search
scope, it may by itself have inadequate efficiency in many
cases. By storing certain additional information, a search
server may greatly improve search efficiency.

The most likely method for obtaining additional
information is simple caching. For instance, after
executing a search, the search client can inform previously
queried search servers of the resources found by the
search. By saving this information, the search servers can
make subsequent similar searches more efficient.

In addition, search clients themselves may use
additional information to manage the search process. For
instance, in the description here, a search client randomly
explores a given cluster in search of a better cluster.
Rather than explore randomly, the search client may use
additional information to better guide its search. For
instance, if the search client knows of synonyms for the
search terms, the search client could include the synonyms
in its search queries and favor neighbor resources that
contain the synonyms.

4.1.4 Finding Resources with a Single Term

The above technique for searching the Ingrid topology
requires that at least one cluster is known for each term
in the search terms. Strictly speaking, this is not
necessary, in the sense that a search client could
theoretically randomly search the Ingrid topology from any
starting point. However, finding a cluster with even one
term using this approach would be prohibitively
inefficient.

An efficient method for finding the initial cluster is
to use a search database consisting of entries for each
individual term in Ingrid space. In other words, each entry
is indexable by a single term. Each entry contains one or
at most a small number of pointers to search servers that
contain resources that contain that individual term. This
database can be queried at the beginning of a search to
find the initial clusters.

This approach scales well for a number of reasons.
First, the database size scales only by the number of terms
in Ingrid space (as opposed to the number of term
combinations). This number (perhaps several million) is
manageable with current technology. Second, the number of
queries to each database can be kept adequately small. This
is because the database can be replicated so that each
database receives only a fraction of the queries. In
addition, search clients can save the results of previous
searches, and so in many instances will already know of
starting clusters at the beginning of a new search.
Techniques for creating and maintaining such a
replicated database are well-known.

5. Creating an Ingrid Topology

The previous sections described the Ingrid topology,
how to efficiently search the Ingrid topology, and how to
efficiently gather resources from the Ingrid topology once
at least one appropriate resource has been found. This
section describes a technique for creating an Ingrid
topology in a fully distributed system. This section
completes the necessary components required for a fully
functional Ingrid Information Navigation System.

The Ingrid topology is created incrementally. When a
new resource is added, no existing links are modified
(deleted or attached to different resources). The only new
links are those from the new resource to already existing
resources. When an existing resource is deleted, only those
links between the deleted resource and other resources are
deleted. New links may be added, however, to repair any
cluster partitions that may have occurred because of the
deletion.

To maintain an Ingrid topology, one additional piece 
of information is associated with every resource. This is a 
single sequence number. 

The basic technique for adding a new resource to the
Ingrid topology is to:

1. Execute a search for each term combination in the
new resource.

2. Add forward links from the new resource to the
resources found.

3. Add backward links from the found resources to the
new resource.

4. Set the sequence number of the new resource to be
one greater than the highest sequence number of the found
resources.

The search executed must be one that returns the same
results as that described in section 4. Note that the 
forward and backward links (elsewhere called pointers) form 
a single link between two resources. 

When links are added to this set of resources, the new 
resource will be attached to every cluster relevant to the 
new resource's terms. The new resource, however, will have 
added a minimum number of links, thus keeping the topology 
sparse.
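
As a rough illustration of the four steps of the basic addition technique, the following Python sketch adds resources to an in-memory topology. The `Resource` class, the `toy_search` helper, and the choice of 1 as the sequence number of the very first resource are assumptions made for this example. In particular, `toy_search` is a centralized stand-in for the distributed search of section 4, not that search itself: it keeps one resource per largest matching term sub-combination.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    term_combinations: list
    sequence: int = 0
    forward: set = field(default_factory=set)   # links made when this resource was added
    backward: set = field(default_factory=set)  # links made when later resources were added


def toy_search(topology, terms):
    """Hypothetical stand-in for the section 4 search.

    Returns the names of resources covering the largest matching term
    sub-combinations of `terms`, one resource per sub-combination.
    """
    found = {}  # matched sub-combination -> resource name
    for res in topology.values():
        for tc in res.term_combinations:
            overlap = terms & tc
            if not overlap:
                continue
            if any(overlap <= kept for kept in found):
                continue  # an equal or larger match is already covered
            for kept in [k for k in found if k < overlap]:
                del found[kept]  # superseded by the larger match
            found[overlap] = res.name
    return list(dict.fromkeys(found.values()))


def add_resource(topology, name, term_combinations):
    new = Resource(name, [frozenset(tc) for tc in term_combinations])
    found = []
    for tc in new.term_combinations:          # step 1: search per term combination
        for other in toy_search(topology, tc):
            if other not in found:
                found.append(other)
    for other in found:
        new.forward.add(other)                # step 2: forward links
        topology[other].backward.add(name)    # step 3: backward links
    # Step 4: one greater than the highest sequence number found
    # (assumption: an isolated first resource gets sequence number 1).
    new.sequence = 1 + max((topology[o].sequence for o in found), default=0)
    topology[name] = new
    return new
```

In this sketch, a third resource with terms {x, y, z} links only to an existing resource that already contains {x, y, z}; because that resource also belongs to the {x, y} cluster, one link attaches the newcomer to both clusters, which is the sparseness property described above.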

Note that a variation on the above scheme allows two
or three resources for each term sub-combination to be found
by the search instead of just one. This is a trivial enhancement
to the above search technique, whereby the search client
gathers one or two additional resources for each cluster in
addition to the one found by the search. This variation
will result in more robust clusters in the sense that the
cluster will be harder to partition. It may also reduce the
time required for gathering, as the resulting cluster will
have a smaller diameter. However, it results in more links,
and therefore greater overhead.

The basic technique for deleting a resource in the
Ingrid topology is as follows. Any given resource will
potentially have both forward links (those attached to
other resources when the resource was added) and backward
links (those attached when other resources were added).
When a resource is deleted, the resources with backward
links to the deleted resource simply delete the links and
do nothing else. The resources that had forward links to
the deleted resource must re-search in order to re-attach
to the clusters that they were partitioned from.

The re-searching must only find resources with lower
sequence numbers than that of the deleted resource. This is
necessary to ensure that the cluster does not remain
partitioned after the new links are added. Fig. 8
illustrates an exemplary case in which this is done
incorrectly, so that the partition produced by the deletion
of one resource is not repaired, which shows why this
restriction is necessary.

When a resource is deleted, the cluster it belonged to
may be partitioned into two. Of these two partitions, all
of the resources in one partition will have sequence numbers
lower than that of the deleted resource, and all of the
resources in the other partition will have higher sequence
numbers.

The resources responsible for repairing the partition
are those that formerly had forward links to the deleted
resource. Of these, one or more of them will have the
lowest sequence number of all those in the higher
partition. If these resources re-attach to resources with
lower sequence numbers, the partition will be repaired.
Otherwise, the partition will remain.

As an optimization to the re-search, before the
deleted resource is deleted, it can inform its backward
link neighbors of its forward link neighbors. The backward
link neighbors can then use the informed forward link
neighbors as starting points for the subsequent search,
thus making the overall search more efficient.
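
The basic deletion and repair technique of this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the embodiment itself: each resource carries a single term combination, `restricted_search` is a hypothetical centralized stand-in for the re-search, and the re-search is limited to resources whose sequence number is lower than that of the deleted resource.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    terms: frozenset      # a single term combination, for brevity
    sequence: int
    forward: set = field(default_factory=set)   # names linked to when this resource was added
    backward: set = field(default_factory=set)  # names of later resources that linked here


def restricted_search(topology, terms, below_sequence):
    """Hypothetical stand-in for the re-search: the best-overlapping
    resource whose sequence number is lower than `below_sequence`."""
    candidates = [r for r in topology.values()
                  if r.sequence < below_sequence and terms & r.terms]
    if not candidates:
        return None
    # Prefer the largest overlap; break ties by the lowest sequence number.
    best = min(candidates, key=lambda r: (-len(terms & r.terms), r.sequence))
    return best.name


def delete_resource(topology, name):
    gone = topology.pop(name)
    # Older neighbors held a backward link to the deleted resource:
    # they simply drop the link and do nothing else.
    for older in gone.forward:
        topology[older].backward.discard(name)
    # Newer neighbors held a forward link to the deleted resource:
    # they drop it and re-search among lower-sequence resources.
    for newer in gone.backward:
        res = topology[newer]
        res.forward.discard(name)
        shared = res.terms & gone.terms
        target = restricted_search(topology, shared, gone.sequence)
        if target is not None and target not in res.forward:
            res.forward.add(target)
            topology[target].backward.add(newer)
```

For a chain a (1), b (2), c (3) in which c links forward to b and b links forward to a, deleting b leaves c to re-search below sequence number 2, so c re-attaches to a and the cluster is whole again.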

5.1 Embodiment of Ingrid Topology Creation

Since Ingrid topology creation is done by
incrementally adding and deleting resources as described
above, the embodiment of the overall Ingrid topology
creation process is nothing more than the embodiment of the
resource adding and deleting functions.

To embody these resource adding and deleting
functions, the following four sub-systems are required:

1. Resource addition sub-system 

2. Resource deletion sub-system 

3. Link addition sub-system 

4. Link deletion sub-system 

The resource addition and resource deletion
sub-systems are executed on the computer that contains the
added or deleted resource. The link addition and link
deletion sub-systems are executed on the computers that
contain the neighbors of the added or deleted resource.

All of these sub-systems operate on the resource entry
table. Thus, it is assumed that the four sub-systems
operate on the same computer. The four sub-systems are
collectively called the topology manager sub-system.

It is also assumed that the topology manager
sub-system either (1) runs on the same computers as the link
and search servers, or (2) can remotely update the resource
entry tables on the link and search servers to reflect any
new state.

5.1.1 Embodiment of the Resource Addition Sub-System

The resource addition sub-system initially is given a
resource with the following information:

1. A pointer to the resource

2. One or more term combinations

For each term combination, the resource addition
sub-system executes the same search as the search client, as
shown in Fig. 7 described above. The resource addition
sub-system then sends a link add message to each of the
resources found by the search to add a link. The link add
message contains the largest term combination common
between the searched term combination and the relevant term
combination of the resource found by the search.

More specifically, this process is realized by the
operation according to the flow chart shown in Fig. 9, as
follows.

First, a command to add a resource containing a
resourcePointer and a list of term combinations is received
(step 901), and a new resourceEntry RE for the resource
entry table is created (step 902). Then, the
resourcePointer is added to the new resourceEntry RE (step
903), and the term combinations are added to the new
resourceEntry RE (step 904).

Then, for each term combination TC in the list of term
combinations, the following steps 905 to 908 are carried
out. Namely, a search for the term combination TC is
executed such that this search returns a list of resources
(step 905). Then, for each resource R in the list of
resources returned by the search, the following steps 906
to 908 are carried out. That is, the link for the resource
R is added to the new resourceEntry RE (step 906), a term
combination T containing the common terms between the term
combination TC and the relevant term combination of the
resource R is created (step 907), and a link add message
containing the term combination T is sent to the topology
manager sub-system of the resource R (step 908).

After these steps 905 to 908 are completed for all the
term combinations in the list of term combinations, the
sequence number of the new resourceEntry RE is set to be
one greater than the largest of those in the list of
resources (step 909).

5.1.2 Embodiment of the Resource Deletion Sub-System

The resource deletion sub-system initially is given a
resource in its resource entry table to delete. The
resource deletion sub-system sends a link delete message to
the link deletion sub-system for each of the links
associated with that resource. The link delete message
carries the term combination of the resource being deleted
and the resource pointers for the resources at either end
of the link. The resource deletion sub-system then deletes
the resource entry from the resource entry table.

More specifically, this process is realized by the
operation according to the flow chart shown in Fig. 10, as
follows.

First, a command to delete a resourceEntry RE from the
resource entry table is received (step 1001). Then, for
each link L with the resource R in the resourceEntry RE, a
link delete message is sent to the topology manager
sub-system for the link L (step 1002).

Then, the resourceEntry RE is deleted from the
resource entry table (step 1003).

5.1.3 Embodiment of the Link Addition Sub-System

The link addition sub-system begins execution when it
receives a link add message. The link addition sub-system
finds the entry in the resource entry table matching the
resource information in the received link add message. The
link addition sub-system then adds the link to the found
entry in the resource entry table.
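
A minimal Python sketch of this message handling is given below. The field names of `ResourceEntry` and `LinkAddMessage`, the dictionary used as the resource entry table, and the "backward" tag stored with the link are assumptions chosen for illustration; the description above does not fix a message layout.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceEntry:
    resource_pointer: str
    term_combinations: list
    # neighbor pointer -> (common term combination, link direction)
    links: dict = field(default_factory=dict)


@dataclass(frozen=True)
class LinkAddMessage:
    target: str        # resource that should gain the link (hypothetical field)
    source: str        # newly added resource (hypothetical field)
    terms: frozenset   # term combination T carried by the message


def common_terms(searched, found):
    """Term combination T of step 907: the terms shared by the searched
    combination and the relevant combination of the found resource."""
    return frozenset(searched) & frozenset(found)


def handle_link_add(table, msg):
    """Link addition sub-system: find the matching entry, then add the link.

    Returns False when the table holds no entry for the target resource.
    """
    entry = table.get(msg.target)
    if entry is None:
        return False
    # From the target's point of view this is a backward link, because it
    # was attached when another resource was added.
    entry.links[msg.source] = (msg.terms, "backward")
    return True
```

A resource addition sub-system built along these lines would create one `LinkAddMessage` per resource returned by its search and hand it to the topology manager sub-system responsible for that resource.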

5.1.4 Embodiment of the Link Deletion Sub-System 

The link deletion sub-system begins execution when it 
receives a link delete message. 

If the link specified in the link delete message is 
for a backward link, the link deletion sub-system simply 
deletes the entry for the link in the resource entry table. 

If the link specified in the link delete message is 
for a forward link, however, the link deletion sub-system 
must then execute a search for the terms associated with 
that link. This search is almost identical to that shown in
Fig. 7 described above. The difference is that only those 
resources with a sequence number lower than that of the 
resource for which the link was deleted are considered. 

More specifically, this modified search is executed
according to the flow chart of Fig. 11, as follows.

First, a command to search for resources with a term
combination TCg and a sequence number smaller than seqN is
received (step 1101), and the resources list and the
queried list are cleared (step 1102), while the search
servers known to contain resources with the term
combination TCg are put into the found list (step 1103).

Then, while the found list is not empty and no
resource R in the resources list has a term combination that
fully matches TCg, the following steps 1104 to 1117 are
carried out.

Namely, the search server SS with the most terms is
selected from the found list (step 1104), and this search
server SS is moved from the found list to the queried list
(step 1105). Then, a search query message containing the
term combination TCg is sent to this search server SS (step
1106).

Next, for each link L (containing a search server SS1,
a term combination T1, and a sequence number SN1) in the
list of resource entry answers received from the search
server SS, the following steps 1107 to 1112 are carried
out. Namely, whether the sequence number SN1 is less than
seqN or not is judged (step 1107). If so, whether the
search server SS1 of the link L is in either the found list
or the queried list, or not, is judged (step 1108), and if
not, the search server SS1 and the term combination T1 are
added to the found list (step 1109). Otherwise, whether the
search server SS1 of the link L is in the found list or not
is judged (step 1110), and if so, whether the term
combination T1 contains more terms than that of the found
list entry or not is judged (step 1111). If so, the terms
for the search server SS1 in the found list entry are
replaced with the term combination T1 (step 1112).

Next, for each resource R with the term combination T
in the list of resource entry answers received from the
search server SS, the following steps 1113 to 1117 are
carried out. Namely, whether the resource R is in the
resources list or not is judged (step 1113). If not, for
each resource Ri with the term combination Ti in the
resources list, the following steps 1114 and 1115 are
carried out. That is, whether the term combination Ti is a
subset of or identical to the term combination T, or not,
is judged (step 1114), and if so, the resource Ri is
removed from the resources list (step 1115). Then,
considering each resource Ri with the terms Ti in the
resources list, the following steps 1116 and 1117 are
carried out. That is, whether none of the term combinations
Ti are a superset of the term combination T or not is
judged (step 1116), and if so, the resource R is added to
the resources list (step 1117).

After these steps 1104 to 1117 are completed, the
resources list now contains the set of found resources
(step 1118).
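
The loop of steps 1101 to 1118 can be sketched in Python as follows. This is an illustrative reading of the flow chart description, not the embodiment itself. It assumes that each search server is modeled as a callable returning a pair (links, resources), that `seeds` maps the initially known search servers to the terms they are known to match, and that term combinations are `frozenset` values; all of these names are hypothetical.

```python
def modified_search(servers, seeds, query, seq_n):
    """Sketch of the modified search.

    servers: server name -> callable(query) returning (links, resources),
             where links is a list of (server name, terms, sequence number)
             and resources is a list of (resource name, terms).
    seeds:   server name -> terms, the initial found list (step 1103).
    """
    found = dict(seeds)   # found list: server -> best known matching terms
    queried = set()       # queried list (cleared, step 1102)
    resources = {}        # resources list: resource -> matching terms

    # Continue while servers remain and no resource fully matches the query.
    while found and query not in resources.values():
        ss = max(found, key=lambda s: len(found[s]))          # step 1104
        del found[ss]                                         # step 1105
        queried.add(ss)
        links, answers = servers[ss](query)                   # step 1106

        for ss1, t1, sn1 in links:
            if sn1 >= seq_n:                                  # step 1107
                continue
            if ss1 not in found and ss1 not in queried:       # step 1108
                found[ss1] = t1                               # step 1109
            elif ss1 in found and len(t1) > len(found[ss1]):  # steps 1110, 1111
                found[ss1] = t1                               # step 1112

        for r, t in answers:
            if r in resources:                                # step 1113
                continue
            # Steps 1114, 1115: drop entries whose terms are a subset of
            # or identical to the terms of the new resource.
            for ri in [ri for ri, ti in resources.items() if ti <= t]:
                del resources[ri]
            # Steps 1116, 1117: add R unless a remaining entry is a superset.
            if not any(ti >= t for ti in resources.values()):
                resources[r] = t

    return resources                                          # step 1118
```

Only the links are filtered by sequence number here, as in the flow chart description, so a caller would need to ensure that the seed servers themselves satisfy the sequence number restriction.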

The forward link is then deleted, and any resources
found by the modified search are added as new links.

More specifically, this process is realized by the
operation according to the flow chart shown in Fig. 12, as
follows.

First, a link delete message for a resource R and a
link L is received (step 1201), and the resource R and the
link L are found in the resource entry table (step 1202).
Then, whether the link L is a forward link or not is judged
(step 1203). If the link L is the forward link, then the
following steps 1204 to 1208 are carried out.

Namely, assuming TC is the term combination of the
link being deleted (step 1204), a modified search for the
term combination TC using the sequence number of the
resource R is executed, such that this modified search
returns a list of resources (step 1205). Then, for each
resource Rr in the list of resources returned from the
modified search, the following steps 1206 to 1208 are
carried out. That is, the link for the resource Rr is added
to the resourceEntry RE (step 1206), a term combination T
containing the common terms between the term combination TC
and the term combinations of the resource Rr is created
(step 1207), and a link add message containing the term
combination T is sent to the topology manager sub-system of
the resource Rr (step 1208).

After these steps 1204 to 1208 are completed for the
link L which is the forward link, or when the link L is not
the forward link, the link L is deleted from the resource R
(step 1209), and the sequence number of the resource R is
set to be the highest of the modified set of links plus one
(step 1210).

6. Software Implementation of the Invention 

It is to be noted that the above described embodiments
according to the present invention may be conveniently
implemented using conventional general purpose digital
computers programmed according to the teachings of the
present specification, as will be apparent to those skilled
in the computer art. Appropriate software coding can
readily be prepared by skilled programmers based on the
teachings of the present disclosure, as will be apparent to
those skilled in the software art.

For instance, any desired combination of the link
server and the gather client of the Ingrid gathering
function, the search server and the search client of the
Ingrid searching function, and the topology manager
sub-system of the Ingrid creation and maintenance function
described above can be conveniently implemented into a
software package.

In particular, the memory content in the Ingrid server
computer shown in Fig. 15 can be conveniently implemented
into a software package. This Ingrid server computer of
Fig. 15 represents the system that would be used by the
provider of an information resource. The provider would
presumably have some information resources that could be
retrieved by currently available methods such as that using
a Netscape Navigator [4], for instance. The information
provider would use the resource addition and deletion
sub-systems to add/delete each of his resources to/from the
Ingrid topology. The link server and search server
sub-systems would be used to help Ingrid client systems
search for and gather relevant information resources. Note
that the resource addition and deletion require the use of
the search client sub-system shown in Fig. 16, as described
above.

Also, the memory content in the Ingrid client computer
shown in Fig. 16 can be conveniently implemented into a
software package. This Ingrid client computer of Fig. 16
represents the system that would be used by a searcher of
an information resource. This system would presumably be
attached to some kind of application software which would
provide the interface between the user and Ingrid.
Different versions of the application software could vary
widely, for instance an application used as a local yellow
pages service versus an application used to find scientific
papers. However, the basic underlying functionality used
for navigating the Ingrid topology by the system of Fig. 16
is the same.

Such a software package can be a computer program
product which employs a storage medium including stored
computer code which is used to program a computer to
perform the disclosed function and process of the present
invention. The storage medium may include, but is not
limited to, any type of conventional floppy discs, optical
discs, CD-ROMs, magneto-optical discs, ROMs, RAMs, EPROMs,
EEPROMs, magnetic or optical cards, or any other suitable
media for storing electronic instructions.

As should be apparent to those skilled in the art, the
information navigation system called Ingrid as described
above is particularly effective in realizing (1) a search
for desired resources among Internet resources, and (2)
topic-based browsing of Internet resources.

It is to be noted that, besides those already
mentioned above, many modifications and variations of the
above embodiments may be made without departing from the
novel and advantageous features of the present invention.
Accordingly, all such modifications and variations are
intended to be included within the scope of the appended
claims.

CLAIMS:



1. An information navigation system, comprising: 

information resources having an information resource
topology in which each information resource is associated
with at least one term combination and a set of links,
where each term combination specifies a set of terms
describing each information resource and each link links
information resources with matching term combinations, and
for every existing term combination, a set of information
resources that contain said every existing term combination
form a cluster, where a cluster is defined as a set of
information resources for which there exists at least one
path between every pair of information resources in said
set of information resources such that said at least one
path contains only information resources from said set of
information resources and a path is defined as a series of
information resources connected through links; and

information navigation means for navigating through
the information resource topology.

2. The information navigation system of claim 1, wherein
each cluster is sparsely connected.

3. The information navigation system of claim 1, wherein
the information resource topology is formed across multiple
networked computer systems.

4. The information navigation system of claim 1, wherein
the information navigation means is provided in each one of
multiple networked computer systems.

5. The information navigation system of claim 1, wherein
the set of terms describing each information resource are
given by any of keywords, title words, and author names
related to each information resource.

6. The information navigation system of claim 1, wherein
the information resource topology is an approximated
information resource topology in which not every cluster
contains all of the information resources that contain the
term combination associated with the information resources
belonging to said every cluster.

7. The information navigation system of claim 1, wherein
the information navigation means includes gathering means
for gathering all information resources that contain a
given term combination when at least one information
resource containing said given term combination is known,
by successively traversing the links from said at least one
information resource containing said given term
combination.

8. The information navigation system of claim 7, wherein 
the gathering means includes:

link servers, each for storing term combinations and
links of the information resources, and answering queries
about the stored term combinations by listing the links and
the information resources associated with those stored term
combinations that fully match queried term combinations;
and

a gather client for sending queries about a term
combination to one link server, and thereby learning fully
matching information resources and any additional relevant
link servers which can be subsequently queried.

9. The information navigation system of claim 1, wherein
the information navigation means includes searching means
for searching at least one information resource that
contains a given term combination by successively searching
clusters with an increasing number of terms matching with
said given term combination.

10. The information navigation system of claim 9, wherein
when no information resource with the term combination
matching with said given term combination is found, the
searching means finds a set of information resources that
collectively contain as many of term sub-combinations of
said given term combination as possible.

11. The information navigation system of claim 9, wherein
the searching means includes:

search servers, each for storing term combinations and
links of the information resources, and answering queries
about the stored term combinations by listing the links and
the information resources associated with those stored term
combinations that partially match queried term
combinations; and

a search client for sending queries about said given
term combination to one search server, and thereby learning
partially matching information resources and any additional
relevant search servers which can be subsequently queried.

12. The information navigation system of claim 1, wherein
the information navigation means includes a topology manager
for creating and maintaining the information resource
topology.

13. The information navigation system of claim 12, wherein 
the topology manager includes:

resource addition means for adding a new information
resource to the information resource topology, by searching
the information resource topology for information resources
with term combinations or sub-combinations that match those
of the new information resource and issuing link add
messages to matching information resources found by the
searching.

14. The information navigation system of claim 13, wherein 
the topology manager further includes:

link addition means for adding links from the new
information resource to the matching information resources
informed by link add messages from the resource addition
means.

15. The information navigation system of claim 12, wherein
the topology manager includes:

resource deletion means for deleting an information
resource from the information resource topology by issuing
link delete messages to those information resources which
are linked to the deleted information resource.

16. The information navigation system of claim 15, wherein
the topology manager further includes:

link deletion means for deleting links to the
deleted information resource informed by link delete
messages from the resource deletion means, by re-searching
the information resource topology for information resources
with term sub-combinations that match those of the deleted
information resource.

17. A method of information navigation through information
resources, comprising the steps of:

forming an information resource topology among the
information resources in which each information resource is
associated with at least one term combination and a set of
links, where each term combination specifies a set of terms
describing each information resource and each link links
information resources with matching term combinations, and
for every existing term combination, a set of information
resources that contain said every existing term combination
form a cluster, where a cluster is defined as a set of
information resources for which there exists at least one
path between every pair of information resources in said
set of information resources such that said at least one
path contains only information resources from said set of
information resources and a path is defined as a series of
information resources connected through links; and

navigating through the information resource topology.

18. The method of claim 17, wherein at the forming step,
the information resource topology is formed with each
cluster sparsely connected.

19. The method of claim 17, wherein at the forming step,
the information resource topology is formed across multiple
networked computer systems.

20. The method of claim 17, wherein the navigating step is
carried out in each one of multiple networked computer
systems.

21. The method of claim 17, wherein at the forming step,
the set of terms describing each information resource are
given by any of keywords, title words, and author names
related to each information resource.

22. The method of claim 17, wherein at the forming step,
the information resource topology is an approximated
information resource topology in which not every cluster
contains all of the information resources that contain the
term combination associated with the information resources
belonging to said every cluster.



23. The method of claim 17, wherein the navigating step
realizes a gathering function to gather all information
resources that contain a given term combination when at
least one information resource containing said given term
combination is known, by successively traversing the links
from said at least one information resource containing said
given term combination.

24. The method of claim 23, wherein the navigating step
includes the steps of:

operating link servers, each for storing term 
combinations and links of the information resources, and 
answering queries about the stored term combinations by 
listing the links and the information resources associated 
with those stored term combinations that fully match 
queried term combinations; and 

operating a gather client for sending queries about a 
term combination to one link server, and thereby learning 
fully matching information resources and any additional 
relevant link servers which can be subsequently queried. 

25. The method of claim 17, wherein the navigating step 
realizes a searching function to search at least one 
information resource that contains a given term combination
by successively searching clusters with an increasing 
number of terms matching with said given term combination. 

26. The method of claim 25, wherein when no information 
resource with the term combination matching with said given 
term combination is found, the searching function finds a
set of information resources that collectively contain as
many of term sub-combinations of said given term
combination as possible.

27. The method of claim 25, wherein the navigating step 
includes the steps of:

operating search servers, each for storing term
combinations and links of the information resources, and
answering queries about the stored term combinations by
listing the links and the information resources associated
with those stored term combinations that partially match
queried term combinations; and

operating a search client for sending queries about
said given term combination to one search server, and
thereby learning partially matching information resources
and any additional relevant search servers which can be
subsequently queried.

28. The method of claim 17, wherein the navigating step 
realizes a topology managing function to manage the
information resource topology.

29. The method of claim 28, wherein the navigating step
includes the step of:

operating resource addition means for adding a new
information resource to the information resource topology,
by searching the information resource topology for
information resources with term combinations or
sub-combinations that match those of the new information
resource and issuing link add messages to matching
information resources found by the searching.

30. The method of claim 29, wherein the navigating step
further includes the step of:

operating link addition means for adding links from
the new information resource to the matching information
resources informed by link add messages from the resource
addition means.

31. The method of claim 28, wherein the navigating step
includes the step of:

operating resource deletion means for deleting an
information resource from the information resource topology
by issuing link delete messages to those information
resources which are linked to the deleted information
resource.

32. The method of claim 31, wherein the navigating step 
further includes the step of:

operating link deletion means for deleting links to
the deleted information resource informed by link delete
messages from the resource deletion means, by re-searching
the information resource topology for information resources
with term sub-combinations that match those of the deleted
information resource.

33. An article of manufacture, comprising: 

a computer usable medium having computer readable
program code means embodied therein for causing a computer
to function as an information navigation system, the
computer readable program means including:

first computer readable program code means for causing
the computer to form an information resource topology among
information resources in which each information resource is
associated with at least one term combination and a set of
links, where each term combination specifies a set of terms
describing each information resource and each link links
information resources with matching term combinations, and
for every existing term combination, a set of information
resources that contain said every existing term combination
form a cluster, where a cluster is defined as a set of
information resources for which there exists at least one
path between every pair of information resources in said
set of information resources such that said at least one
path contains only information resources from said set of
information resources and a path is defined as a series of
information resources connected through links; and

second computer readable program code means for
causing the computer to navigate through the information
resource topology.

34. The article of manufacture of claim 33, wherein the 
first computer readable program code means forms the 
information resource topology with each cluster sparsely 
connected. 

35. The article of manufacture of claim 33, wherein the 
first computer readable program code means forms the 
information resource topology across multiple networked 
computer systems. 

36. The article of manufacture of claim 33, wherein the 
second computer readable program code means is operated in
each one of multiple networked computer systems. 

37. The article of manufacture of claim 33, wherein the
first computer readable program code means forms the
information resource topology using the set of terms
describing each information resource which are given by any
of keywords, title words, and author names related to each
information resource.

38. The article of manufacture of claim 33, wherein the
first computer readable program code means forms the
information resource topology as an approximated
information resource topology in which not every cluster
contains all of the information resources that contain the
term combination associated with the information resources
belonging to said every cluster.



39. The article of manufacture of claim 33, wherein the
second computer readable program code means realizes a
gathering function to gather all information resources that
contain a given term combination when at least one
information resource containing said given term combination
is known, by successively traversing the links from said at
least one information resource containing said given term
combination.

40. The article of manufacture of claim 39, wherein the
second computer readable program code means realizes the
gathering function by:

operating link servers, each for storing term
combinations and links of the information resources, and
answering queries about the stored term combinations by
listing the links and the information resources associated
with those stored term combinations that fully match
queried term combinations; and

operating a gather client for sending queries about a
term combination to one link server, and thereby learning
fully matching information resources and any additional
relevant link servers which can be subsequently queried.

41. The article of manufacture of claim 33, wherein the 
second computer readable program code means realizes a
searching function to search at least one information
resource that contains a given term combination by 
successively searching clusters with an increasing number 
of terms matching with said given term combination. 

42. The article of manufacture of claim 41, wherein when
no information resource with the term combination matching
with said given term combination is found, the searching
function finds a set of information resources that
collectively contain as many of term sub-combinations of
said given term combination as possible.

43. The article of manufacture of claim 41, wherein the
second computer readable program code means realizes the 
searching function by: 

operating search servers, each for storing term
combinations and links of the information resources, and
answering queries about the stored term combinations by
listing the links and the information resources associated
with those stored term combinations that partially match
queried term combinations; and

operating a search client for sending queries about
said given term combination to one search server, and
thereby learning partially matching information resources
and any additional relevant search servers which can be
subsequently queried.

44. The article of manufacture of claim 33, wherein the
second computer readable program code means realizes a
topology managing function to manage the information
resource topology.

45. The article of manufacture of claim 44, wherein the 
second computer readable program code means realizes the 
topology managing function by: 

operating resource addition means for adding a new
information resource to the information resource topology,
by searching the information resource topology for
information resources with term combinations or
sub-combinations that match those of the new information
resource and issuing link add messages to matching
information resources found by the searching.

46. The article of manufacture of claim 45, wherein the 
second computer readable program code means realizes the
topology managing function by:

operating link addition means for adding links from
the new information resource to the matching information
resources informed by link add messages from the resource
addition means.

47. The article of manufacture of claim 44, wherein the
second computer readable program code means realizes the
topology managing function by:

operating resource deletion means for deleting an
information resource from the information resource topology
by issuing link delete messages to those information
resources which are linked to the deleted information
resource.

48. The article of manufacture of claim 47, wherein the
second computer readable program code means realizes the
topology managing function by:

operating link deletion means for deleting links to
the deleted information resource informed by link delete
messages from the resource deletion means, by re-searching
the information resource topology for information resources
with term sub-combinations that match those of the deleted
information resource.

49. An article of manufacture, comprising:

a computer usable medium having computer readable
program code means embodied therein for causing a computer
to function as an information navigation system for
navigating through an information resource topology among
information resources in which each information resource is
associated with at least one term combination and a set of
links, where each term combination specifies a set of terms
describing each information resource and each link links
information resources with matching term combinations, and
for every existing term combination, a set of information
resources that contain said every existing term combination
form a cluster, where a cluster is defined as a set of
information resources for which there exists at least one
path between every pair of information resources in said
set of information resources such that said at least one
path contains only information resources from said set of
information resources and a path is defined as a series of
information resources connected through links, the computer
readable program means including:

first computer readable program code means for causing
the computer to function as a link server for storing term
combinations and links of the information resources, and
answering queries about the stored term combinations by
listing the links and the information resources associated
with those stored term combinations that fully match
queried term combinations, so as to realize a gathering
function to gather all information resources that contain a
given term combination when at least one information
resource containing said given term combination is known,
by successively traversing the links from said at least one
information resource containing said given term
combination;

second computer readable program code means for 
causing the computer to function as a search server for
storing term combinations and links of the information
resources, and answering queries about the stored term
combinations by listing the links and the information
resources associated with those stored term combinations
that partially match queried term combinations, so as to
realize a searching function to search at least one
information resource that contains a given term combination
by successively searching clusters with an increasing
number of terms matching with said given term combination;
and

third computer readable program code means for causing
the computer to function as a topology manager to manage
the information resource topology.

50. The article of manufacture of claim 49, wherein the
third computer readable program code means realizes the
topology manager which includes a resource addition
sub-system for adding a new information resource to the
information resource topology, by searching the information
resource topology for information resources with term
combinations or sub-combinations that match those of the
new information resource and issuing link add messages to
matching information resources found by the searching.

51. The article of manufacture of claim 50, wherein the
third computer readable program code means realizes the
topology manager which further includes a link addition
sub-system for adding links from the new information 
resource to the matching information resources informed by 
link add messages from the resource addition sub-system. 

52. The article of manufacture of claim 49, wherein the 
third computer readable program code means realizes the 
topology manager which includes a resource deletion
sub-system for deleting an information resource from the
information resource topology by issuing link delete
messages to those information resources which are linked to
the deleted information resource.

53. The article of manufacture of claim 52, wherein the
third computer readable program code means realizes the
topology manager which further includes a link deletion
sub-system for deleting links to the deleted information
resource informed by link delete messages from the resource
deletion sub-system, by re-searching the information
resource topology for information resources with term
sub-combinations that match those of the deleted information
resource.



54. An article of manufacture, comprising: 

a computer usable medium having computer readable 
program code means embodied therein for causing a computer 
to function as an information navigation system for
navigating through an information resource topology among 
information resources in which each information resource is 
associated with at least one term combination and a set of 
links, where each term combination specifies a set of terms 
describing each information resource and each link links 
information resources with matching term combinations, and 
for every existing term combination, a set of information
resources that contain said every existing term combination 
form a cluster, where a cluster is defined as a set of 
information resources for which there exists at least one 
path between every pair of information resources in said 
set of information resources such that said at least one 
path contains only information resources from said set of 
information resources and a path is defined as a series of 
information resources connected through links, the computer 
readable program means including: 

first computer readable program code means for causing
the computer to function as a gather client for sending
queries about a term combination to one link server, and
thereby learning fully matching information resources and
any additional relevant link servers which can be
subsequently queried, so as to realize a gathering function
to gather all information resources that contain a given
term combination when at least one information resource
containing said given term combination is known, by
successively traversing the links from said at least one
information resource containing said given term
combination; and

second computer readable program code means for
causing the computer to function as a search client for 
sending queries about a given term combination to one 
search server, and thereby learning partially matching 
information resources and any additional relevant search
servers which can be subsequently queried, so as to realize
a searching function to search at least one information
resource that contains said given term combination by
successively searching clusters with an increasing number
of terms matching with said given term combination.

55. An information navigation system substantially as hereinbefore
described with reference to the accompanying drawings.

56. A method of information navigation substantially as hereinbefore
described with reference to the accompanying drawings.
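
By way of illustration only, the cluster definition and the gathering function recited in the claims above may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names TERMS, LINKS, gather and forms_cluster and the sample resources a to d are hypothetical, links are assumed to be bidirectional, and the link servers and gather client are collapsed into in-memory dictionaries.

```python
from collections import deque

# Hypothetical toy data: resource name -> set of terms describing it.
TERMS = {
    "a": {"internet", "search"},
    "b": {"internet", "search", "topology"},
    "c": {"internet", "topology"},
    "d": {"search"},
}

# Hypothetical bidirectional links between resources sharing terms.
LINKS = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b"},
    "d": {"a"},
}


def gather(start, combination, terms=TERMS, links=LINKS):
    """Gather resources containing `combination`, starting from one known
    resource and successively traversing links, never leaving the set of
    resources that contain the combination."""
    combination = set(combination)
    if not combination <= terms[start]:
        return set()  # the starting resource must itself match
    found = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in links[current]:
            if neighbour not in found and combination <= terms[neighbour]:
                found.add(neighbour)
                queue.append(neighbour)
    return found


def forms_cluster(combination, terms=TERMS, links=LINKS):
    """Check the cluster property: all resources containing `combination`
    are mutually reachable through paths that stay inside that set."""
    combination = set(combination)
    members = {name for name, t in terms.items() if combination <= t}
    if not members:
        return True  # nothing to connect
    any_member = next(iter(members))
    return gather(any_member, combination, terms, links) == members
```

When the cluster property holds for a term combination, gathering from any one matching resource reaches every other matching resource, which is why a single known resource suffices as the starting point. In the approximated topology of claim 38 the property may not hold, and a traversal of this kind would return only part of the matching set.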